Probabilistic Dependency Networks for Prediction and Diagnostics

Authors

  • Narayanan Unny Edakunni
  • Aditi Raghunathan
  • Abhishek Tripathi
  • John Handley
  • Frédéric Roulland
Abstract

Research in transportation frequently involves modelling and predicting attributes of events that occur at regular intervals. The event could be the arrival of a bus at a bus stop, the volume of traffic at a particular point, the demand at a particular bus stop, etc. In this work, we propose a specific implementation of probabilistic graphical models to learn the probabilistic dependency between the events that occur in a network. A dependency graph is built from past observed instances of the events, and we use the graph to understand the causal effects of some events on others in the system. The dependency graph is also used to predict the attributes of future events and is shown to have good prediction accuracy compared to the state of the art.

INTRODUCTION

In transportation research, we are frequently interested in modelling the causal relation between different events in a transport network. An example of such an event is the arrival of a bus at a bus stop during a day. Each of these events is associated with a time of occurrence, and the events are recurring: a particular bus service arrives at a bus stop at around the same time every day. We can model different attributes associated with these events, such as delay of arrival, demand for the bus at a bus stop, or waiting time of a bus at a bus stop, and use this model to understand the system better and to predict attributes of future events more accurately. These events can be modelled effectively if we have historical data pertaining to them. Traditionally, data mining tools like neural networks (1) and support vector regression (2) have been used to model events as a function of static factors like demographics, geography, etc. However, these methods of modelling do not provide a holistic view of the situation.
For instance, a neural network model of congestion as a function of demographics might use the static information of demographics to model and predict the demand at a bus stop at a particular time, but it fails to take into account the temporal aspect of demand. An alternative to static regression models is to use a time series (3, 4, 5) to model the event. In our running example, the demand at a particular bus stop could be modelled as a function of the time of the day, making it a time series whose parameters are learned from historical measurements of demand. However, this approach cannot capture the effect of other factors that might influence the value of the time series (such as traffic accidents, breakdowns, or the ripple effect of congestion in other parts of the network).

Spatio-temporal models (6, 7, 8, 9) have been proposed in recent times to model entities that are presumed to be affected by the spatial and the temporal properties of the entity. However, most of these methods assume that there exists a continuum of values for the entity being measured. This assumption is violated in many cases; in our running example of modelling demand at a bus stop, the value of demand is valid only for discrete locations in space (points where there is a road) and cannot be assumed to hold for an arbitrary point in space. The demand level at a particular location on the route is typically dependent on discrete points in the network and is a function of the network structure rather than the spatial spread of the events. In cases where spatio-temporal models are able to model discrete spatial points, the modelling complexity is high and the models have not been successfully adapted to the transportation domain.

In this paper, we propose a generalised tool that models discrete observable events in a transportation network as random variables and builds a graphical model (10) over these variables to map the dependencies between them.
The resulting graphical model can be used to predict attributes of future events in the system, given the values of events that have been observed up to the time of making the prediction.

DEPENDENCY GRAPH

Events in a transportation system can be modelled as random variables, and the dependency between variables can be modelled as a probabilistic relation amongst the variables. A graphical model is a framework to model and visualize the conditional dependence between random variables. The probability of a random variable conditioned on other random variables in the system can be learned from the observed values of the random variables. In a graphical model, random variables are represented as nodes of a graph, with directed edges representing the conditional dependence of one variable on another. The model represents the inter-dependency of different random variables and hence is a valuable framework to understand complex systems built from these variables. The other advantage of a graphical model is that when values of certain variables are observed, conditional dependencies can be used to infer the values of the dependent variables. Hence, the framework of graphical models can be used for diagnostics and as a tool for prediction of unobserved variables.

In this paper, we consider a specialised instance of graphical models where the random variables are associated with discrete events that occur at certain times of the day, every day of the week. The events might (or might not) be associated with a spatial component. Hence, our approach is more generic than the usual spatio-temporal methods. Because events are associated with time, they can be ordered according to their time of occurrence; when used to build a graphical model, they form a causal network where the probability of an event occurring later in the day is modelled as a function of a set of events that occurred earlier in the day.
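The time ordering described above can be sketched as follows. This is a minimal illustration, assuming each event carries a single scheduled time; the function name `candidate_parents`, the event labels and the times are hypothetical and are not taken from the paper. For every event it lists the earlier events that are allowed to point to it, which keeps the resulting directed graph free of cycles.

```python
from typing import Dict, List


def candidate_parents(scheduled: Dict[str, float]) -> Dict[str, List[str]]:
    """Map each event to the events scheduled strictly earlier in the day."""
    # Sort event names by their scheduled time of occurrence.
    ordered = sorted(scheduled, key=lambda name: scheduled[name])
    parents: Dict[str, List[str]] = {}
    for event in ordered:
        # Only strictly earlier events may influence this one.
        parents[event] = [e for e in ordered if scheduled[e] < scheduled[event]]
    return parents


# Hypothetical scheduled times, in minutes after midnight.
schedule = {"event_d": 500.0, "event_a": 480.0, "event_c": 492.0, "event_b": 485.0}
graph = candidate_parents(schedule)
```

Here `graph["event_d"]` holds the three earlier events and `graph["event_a"]` is empty. Which of these candidate edges are kept would be decided later, when the model is fitted to data.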
We can include the effect of static information like demographics, weather, etc. by treating them as random variables, with the difference that they do not vary with time at the same rate as the events that we model; hence they do not have a temporal component associated with them.

BUILDING A DEPENDENCY GRAPH

In this section, we describe the method used to build a dependency graph from data. We illustrate the procedure using a simple example of a bus network that consists of 7 arrival events (AEs), identified by the numbers 1 to 7 for the purposes of this example. These arrival events correspond to buses arriving at different bus stops in the network at different times of the day. An arrival event is a tuple of a bus stop and the scheduled time of arrival. The objective is to model the delay at various bus stops using a dependency graph.

The delays associated with these 7 arrival events are observed over a number of days for that particular bus network. Using this data, we identify the dependency of a target AE on other AEs that preceded it in time. For instance, if we observe that AEs 1, 2 and 3 consistently precede AE 4 in time, we build a dependency relation that maps the delay values of AEs 1, 2 and 3 to the delay of AE 4. Mathematically, we learn a probability distribution of the delay in AE 4 conditioned on the observed delays in AEs 1, 2 and 3. It can be expressed as $P(a_4 \mid a_1, a_2, a_3)$, where $a_1, \ldots, a_4$ are random variables corresponding to the delays of the respective AEs.

We model the dependency between AEs using a simple generalized linear relation between the outcome AE and the independent AEs. A generalized linear model is a generalization of linear regression to different noise models. The expected value of the outcome variable $a_4$ is given by:

$$E(a_4 \mid a_1, a_2, a_3, a_0) = g(m_1 a_1 + m_2 a_2 + m_3 a_3 + a_0), \qquad (1)$$

where $g$ is a link function whose form depends on the choice of the linear model.
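Equation (1) can be evaluated directly once the coefficients are known. The sketch below is illustrative only: the function name `expected_delay` and all numeric values are hypothetical, and treating a0 as a constant offset is an interpretation that the text does not state explicitly. The function g is passed in as a plain Python callable, and the two calls use the identity and the exponential as example choices.

```python
import math
from typing import Callable, Sequence


def expected_delay(
    delays: Sequence[float],
    coeffs: Sequence[float],
    offset: float,
    g: Callable[[float], float],
) -> float:
    """Evaluate g(m1*a1 + m2*a2 + m3*a3 + a0) as in Equation (1)."""
    # Linear predictor: weighted sum of the earlier delays plus the offset a0
    # (a0 is assumed here to be a constant offset term).
    eta = offset + sum(m * a for m, a in zip(coeffs, delays))
    return g(eta)


# Hypothetical values: delays of AEs 1-3 in minutes and made-up coefficients.
a = [2.0, 1.0, 4.0]
m = [0.5, 0.0, 0.25]   # a zero coefficient means AE 2 has no influence
a0 = 1.0

identity_mean = expected_delay(a, m, a0, lambda x: x)  # identity choice of g
exp_mean = expected_delay(a, m, a0, math.exp)          # exponential choice of g
```

With these made-up numbers the linear predictor is 3.0, so `identity_mean` is 3.0 and `exp_mean` is `math.exp(3.0)`.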
When the noise is assumed to be Gaussian, the link function is the identity function. For a Poisson-distributed outcome variable, the link is an exponential function. We could also enrich the model through the use of external variables in the mapping. Examples of such extrinsic variables are weather information, demographic information, the day of the week, and so on.

While fitting the model to the data, we try to obtain sparse models in which many of the regression coefficients ($m_1, m_2, \ldots$) are zero. The sparsity of the model ensures that only the most influential dependencies are included, encourages better generalization to previously unseen data, and improves the interpretability of the model by keeping it simple. In view of this requirement, we use lasso regression (11) to fit a sparse model. Lasso linear regression is a ...
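The sparse fit described above can be sketched for the Gaussian (identity) case. The text does not say which solver was used; the code below assumes cyclic coordinate descent with soft-thresholding, one standard way to solve the lasso problem. The names `lasso_fit` and `soft_threshold`, the penalty value and the tiny data set are all hypothetical.

```python
from typing import List, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(
    X: List[List[float]], y: List[float], lam: float, n_iter: int = 50
) -> Tuple[List[float], float]:
    """Minimise (1/2n)*sum((y - b - X.w)^2) + lam*sum(|w|) by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        # Unpenalised offset: mean of the residual left by the current weights.
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = b + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    return w, b


# Hypothetical centred delays of AEs 1-3 (columns) over four days (rows).
delays_by_day = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
# Delay of AE 4: driven mainly by AE 1, only very weakly by AE 2.
target = [5.05, 4.95, 1.05, 0.95]
weights, offset = lasso_fit(delays_by_day, target, lam=0.1)
```

In this toy data the coefficients for AEs 2 and 3 are driven to exactly zero while the coefficient for AE 1 is retained (shrunk by the penalty), which is the sparsity behaviour that motivates the use of lasso.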


Similar resources

The Prediction Dependency on Virtual Social Networks Based on Alexithymia, Attachment Styles, Well-Being Psychological and Loneliness

Introduction: Virtual social networks, like other types of addiction, can be affected by psychological, developmental, and emotional problems. The purpose of this study was to investigate the prediction of dependency on virtual social networks based on alexithymia, attachment styles, psychological well-being and loneliness. The research design was a two-group diagnosti...


A Link Prediction Method Based on Learning Automata in Social Networks

Nowadays, online social networks are considered one of the most important emerging phenomena of human societies. In these networks, link prediction, which relies on existing knowledge of the interactions between network actors, provides an estimate of the probability that a new relationship will be created in the future. A wide range of applications can be found for link prediction such as electro...


Prediction of methanol loss by hydrocarbon gas phase in hydrate inhibition unit by back propagation neural networks

Gas hydrate often occurs in natural gas pipelines and process equipment at high pressure and low temperature. Methanol, as a hydrate inhibitor, is injected into potential hydrate systems, then recovered from the gas phase and re-injected into the system. Since methanol loss imposes an extra cost on gas processing plants, designing a process for its reduction is necessary. In this study, an accur...


Probabilistic Contaminant Source Identification in Water Distribution Infrastructure Systems

Large water distribution systems can be highly vulnerable to penetration of contaminant factors caused by different means including deliberate contamination injections. As contaminants quickly spread into a water distribution network, rapid characterization of the pollution source has a high measure of importance for early warning assessment and disaster management. In this paper, a methodology...


Prediction in Health Domain Using Bayesian Networks Optimization Based on Induction Learning Techniques

A Bayesian network is a directed acyclic graph in which each node represents a variable and each arc a probabilistic dependency; such networks provide a compact form to represent knowledge and flexible methods of reasoning. Obtaining a network from data is a learning process that is divided into two steps: structural learning and parametric learning. In this paper we define an automatic learning...


LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring

Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although most of these schemes do not consider location information, there are scenarios in which location information can be obtained by nodes after their deployment. In this paper,...



Journal title:
  • CoRR

Volume abs/1508.03130

Publication date 2015